Unveiling community structures in weighted networks 
Random walks on simple graphs, in connection with electrical resistor
networks, lead to the definition of Markov chains whose transition
probability matrix is expressed in terms of electrical conductances. We
extend this definition to an effective transition matrix $P_{ij}$ that
accounts for the probability of going from vertex i to any vertex j of
the original connected graph G. We also present an algorithm, based on
this effective transition matrix among the vertices of the network, that
extracts a topological feature related to the way graph G is organized.
This topological feature corresponds to the communities in the graph.
1. INTRODUCTION

Network modeling is becoming an essential tool to study and understand
the complexity of many natural and artificial systems [1]. Applications
[...] include technological networks such as the Internet, the World
Wide Web and the electric power grid; biological networks such as
metabolic [...] and amino acid residue networks [...]; and, far more
studied, social networks. This understanding first passes through the
analysis of their topological features, usually related to complex
networks. Examples are the degree distribution P(k), the average degree
$\langle k \rangle$, the clustering coefficient C, the "betweenness" of
a vertex i, and "assortative mixing", which describes correlations among
vertices in the network.

Nowadays, an important research issue in the field of complex networks
(graphs) is the study and identification of community structure, a
problem also known as graph partitioning. Many definitions of community
are presented in the literature. In essence, the problem amounts to
dividing the network into groups such that vertices inside each group
share denser connections among themselves than with vertices in any
other group. The main concerns in proposing methods to find communities
are developing successful automatic discovery algorithms and keeping the
execution time from becoming prohibitive for large network sizes n.

More recently, various methods have been proposed to find good divisions
of networks [12, ...]. In particular, some techniques are based on
betweenness measures [3], resistor networks [15], Laplacian eigenvalues
[...], quantitative definitions of community structure in networks [18],
or benefit functions known as modularity [3, ...]. Those methods
discover communities in run times that typically scale with the network
size as $O(n^3)$ or even $O(n^4)$. However, there is a proposal that
scales linearly in time but requires parameter-dependent considerations
[15]. This method views the network as an electric circuit with current
flowing through all edges, which are represented by resistors. The
automatic community-finding procedure is hampered by the need to elect
two nodes (poles) that lie in different communities and to define a
threshold in the voltage spectrum.

Here we show how random walkers on graphs, also in connection with
electrical networks, unveil the hierarchies of subnetworks, the
so-called community structure. Our method combines the Laplacian
eigenvalue approach with electrical network theory. A brief review of
how spectral graph theory can characterize the structural properties of
graphs using the eigenvectors of the Laplacian matrix, which is related
to the adjacency matrix, has been presented by Newman [16].

The main aspect of the method relies on a generalization of the usual
transition probability matrix P. The matrix element $P_{ij}$ is the
probability for a walker on a weighted graph at vertex i to move to its
adjacent vertex j. The interpretation of conductances, the inverse of
resistances, among any vertices leads to the definition of an effective
transition matrix that accounts for hops on the graph. By defining a
similarity matrix as a function of the effective transition matrix
elements, it is possible to extract a topological feature related to the
way graph G is organized. It turns out that this topological feature
corresponds to hierarchical classes of vertices, which we interpret as
the communities of network theory.

To explain our method, we present the essentials of the spectral
analysis of Laplacian matrices in Section 2. In Section 3 we present the
arguments leading to the similarity matrix that sets a scale to extract
the community structure. In Section 4 we describe how to implement the
algorithm and show the results for the karate club network studied by
Zachary [19] and for the model designed by Ravasz and Barabasi [20], an
example of a network with scale-free property and modular structure.
Section 5 concentrates our discussion on weighted graphs, and the final
Section 6 contains our conclusions.
2. LAPLACIAN EIGENVALUES AND TRANSITION MATRIX

Let us consider a simple graph G, i.e., undirected and with no loops or
multiple edges, on a finite vertex set $V = \{1, 2, \ldots, n\}$ and
edge set E, represented by the adjacency matrix A. The degree $k_i$ of
each vertex i is obtained from the adjacency matrix A as
$k_i = \sum_{j=1}^{n} A_{ij}$. For non-weighted graphs, the symmetric
$n \times n$ adjacency matrix takes the values $A_{ij} = 1$ if there is
an edge connecting vertices i and j, and 0 otherwise. Thus, $k_i$ counts
the number of edges that connect the selected vertex i to other
vertices. This extends naturally to a weighted adjacency matrix, but we
leave that version to Section 5.

For our purpose we study the graph G through a posi- 
tive semidefinite matrix representation. This is achieved 
in the usual manner using the Laplacian. The Laplacian 
matrix of a graph G on n vertices, denoted by L(G), is 
simply the matrix with elements 



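\[
L_{ij} =
\begin{cases}
k_i & \text{if } i = j, \\
-1 & \text{if } i \text{ and } j \text{ are adjacent}, \\
0 & \text{otherwise},
\end{cases}
\tag{1}
\]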



which corresponds to the diagonal degree matrix minus the adjacency
matrix, L = K - A. The Laplacian matrix has a long history. It was
introduced by Kirchhoff in 1847 in a paper related to electrical
networks [21] and consequently is also known as the Kirchhoff matrix.
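As a concrete illustration of Eq. (1), the following minimal sketch builds L = K - A from an edge list. The function name `laplacian`, the 0-based vertex labels and the example graph are assumptions made for illustration; they do not come from the original text.

```python
def laplacian(n, edges):
    """Return L = K - A for a simple undirected graph on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("simple graphs have no loops")
        A[i][j] = A[j][i] = 1            # symmetric, unweighted adjacency
    degree = [sum(row) for row in A]     # k_i = sum_j A_ij
    return [[degree[i] if i == j else -A[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
L = laplacian(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
for row in L:
    print(row)
```

Every row and column of the result sums to zero, which is why L always has the eigenvalue 0 discussed next.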

The Laplacian matrix is real and symmetric. Moreover, L is a positive
semidefinite singular matrix with n eigenvalues $\lambda_i$ and
eigenvectors $v_i$. If we label the eigenvalues in increasing order,
$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$, we have
$L(G)\, v_1 = 0$. The eigenvalue $\lambda_1 = 0$ is always the smallest
one and has the normalized eigenvector $v_1 = (1, 1, \ldots, 1)/\sqrt{n}$.
Since the matrix L(G) is singular, it has no inverse, but in such cases
it is possible to introduce the so-called generalized inverse $L^{+}$ of
L according to Moore and Penrose's definition [22].

Among the many properties of the second smallest eigenvalue
$\lambda_2(G)$, known as the algebraic connectivity, we recall [...]
that $\lambda_2(G) = 0$ iff G is not connected. For connected networks,
the eigenvector components of the first non-null eigenvalue
($\lambda_2$) have been applied as an approximate method for grouping
vertices into communities [...]. However, the success of the
partitioning depends on how well $\lambda_2$ is separated from the other
eigenvalues.

From now on we identify the graph G = (V, E) with an electrical network
connected by edges of unit resistance [...]. A random walk on G is a
sequence of states (vertices) chosen among their adjacent neighbors. To
describe the overall behavior of a walker on G, one needs to go beyond
the usual analysis of Markov chains with transition matrix $P_{ij}$, the
probability of going from vertex i to an adjacent vertex j, to include
also hops, i.e., moves across the graph. To this end, we evaluate the
effective resistances $r_{ij}$ between all distinct vertices i and j of
G. Those effective resistances can be numerically evaluated by means of
electrical network theory as [...]

\[
r_{ij} = (L^{+})_{ii} + (L^{+})_{jj} - (L^{+})_{ij} - (L^{+})_{ji}
\tag{2}
\]

for $i \neq j$ and $r_{ij} = 0$ for $i = j$. Here, $L^{+}(G)$ is the
Moore-Penrose generalized inverse of the Laplacian matrix L(G). Its
definition amounts to writing $L^{+}(G)$ as
This leads to a simple formulation of the effective resistances between
all pairs of vertices as a function of the eigenvalues and eigenvectors
of L(G),

As a natural generalization, it is convenient to define the effective
conductances $c_{ij}$ for all pairs of vertices as
$c_{ij} = 1/r_{ij}$, for $i \neq j$.
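The pseudoinverse route to Eq. (2) can be sketched in a few lines. In standard spectral form, assuming orthonormal eigenvectors $v_k$ with components $v_{ki}$, one has $L^{+} = \sum_{k=2}^{n} \lambda_k^{-1} v_k v_k^{T}$ and hence $r_{ij} = \sum_{k=2}^{n} (v_{ki} - v_{kj})^2 / \lambda_k$; these are textbook expressions stated here as an assumption, not a transcription of the original displays. The sketch avoids an eigensolver by using the equivalent identity $L^{+} = (L + J/n)^{-1} - J/n$ for connected graphs (J is the all-ones matrix) with exact fractions. The function names and the example graph are hypothetical.

```python
from fractions import Fraction


def invert(M):
    """Gauss-Jordan inverse of a square matrix in exact rational arithmetic."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def effective_resistances(L):
    """r_ij = L+_ii + L+_jj - L+_ij - L+_ji, with L+ = (L + J/n)^-1 - J/n."""
    n = len(L)
    shifted = [[Fraction(L[i][j]) + Fraction(1, n) for j in range(n)]
               for i in range(n)]
    Lp = [[x - Fraction(1, n) for x in row] for row in invert(shifted)]
    return [[Lp[i][i] + Lp[j][j] - Lp[i][j] - Lp[j][i] for j in range(n)]
            for i in range(n)]


# Hypothetical graph: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
L = [[2, -1, -1, 0], [-1, 2, -1, 0], [-1, -1, 3, -1], [0, 0, -1, 1]]
r = effective_resistances(L)
# Effective conductances c_ij = 1 / r_ij for i != j.
c = [[0 if i == j else 1 / r[i][j] for j in range(4)] for i in range(4)]
print(r[0][1], r[2][3], r[0][3])
```

Two triangle vertices see a 1-ohm edge in parallel with a 2-ohm path, giving 2/3, while the pendant edge contributes exactly 1. For large n a numerical eigensolver or pseudoinverse routine would be the practical choice.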

As a consequence of the above results, it is possible to extend the
usual random process, which moves only between adjacent states i and j,
to hops on the graph. We define the hop transition probability from
vertex i to any vertex j by

\[
P_{ij} = \frac{c_{ij}}{c_i},
\tag{5}
\]

where $c_{ij}$ is the effective conductance from i to j and
$c_i = \sum_{j \neq i} c_{ij}$. Since a connected network is considered,
the probability that a walker who begins the run at any given vertex i
reaches any other given vertex does not vanish.
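A minimal sketch of the hop transition probabilities follows, assuming the normalization $P_{ij} = c_{ij}/c_i$ with $c_i$ the sum of effective conductances out of vertex i. The function name and the three-vertex path graph are hypothetical choices for illustration.

```python
from fractions import Fraction


def hop_transition_matrix(r):
    """P_ij = c_ij / c_i with c_ij = 1 / r_ij (i != j) and c_i = sum_j c_ij."""
    n = len(r)
    c = [[Fraction(0) if i == j else 1 / Fraction(r[i][j]) for j in range(n)]
         for i in range(n)]
    return [[c[i][j] / sum(c[i]) for j in range(n)] for i in range(n)]


# Effective resistances of a hypothetical path graph 0-1-2 with unit resistors:
# on a tree, r_ij is the number of edges between i and j.
r = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
P = hop_transition_matrix(r)
print(P[0][1], P[1][0], P[0][2])
```

In this example the walker at vertex 0 can hop directly to the non-adjacent vertex 2, and the matrix is not symmetric even though the conductances are.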



3. METHOD 

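Although $P_{ij}$ is not necessarily equal to $P_{ji}$, it is possible
to describe hierarchical classes of states perceived by the walker as
follows.

Firstly, we consider the generalized "distance" expression,

\[
d_{ij}^{(q)} = \left[ \frac{1}{n-2} \sum_{k \neq i,j}
\left| P_{ik} - P_{jk} \right|^{q} \right]^{1/q},
\tag{6}
\]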

where q is a positive real number, as a similarity measure between any
vertices. A small $d_{ij}^{(q)}$ would imply high similarity between i
and j and could be used to set a hierarchical classification.
Unfortunately, this measure does not provide a good score to classify
those states into communities. We have realized that the fluctuations
$s_{ij}$ in $|P_{ik} - P_{jk}|$ indeed play the main role in that
classification. Let us take q = 1 and define

\[
d_{ij} = \frac{1}{n-2} \sum_{k \neq i,j} \left| P_{ik} - P_{jk} \right|
\tag{7}
\]
as the average "distance" between i and j. The standard deviation
between those vertices is given by

\[
s_{ij} = \left[ \frac{1}{n-3} \sum_{k \neq i,j}
\left( \left| P_{ik} - P_{jk} \right| - d_{ij} \right)^{2} \right]^{1/2}.
\tag{8}
\]



As a matter of fact, this quantity gives a better description of the
similarity among the vertices than the average value in Eq. (6). The
importance of those fluctuations for classifying vertices into
communities may be summarized by saying that we should not ask how far
apart two vertices are, but who their neighbors are.
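The average "distance" of Eq. (7) and the fluctuation of Eq. (8) can be computed as in the sketch below. It assumes that the sums run over all vertices k other than i and j, as the normalizations n - 2 and n - 3 suggest; the function name and the small matrix P are hypothetical.

```python
from math import sqrt


def mean_and_fluctuation(P, i, j):
    """Return (d_ij, s_ij); k runs over all vertices other than i and j."""
    n = len(P)
    diffs = [abs(P[i][k] - P[j][k]) for k in range(n) if k not in (i, j)]
    d = sum(diffs) / (n - 2)                                  # Eq. (7)
    s = sqrt(sum((x - d) ** 2 for x in diffs) / (n - 3))      # Eq. (8)
    return d, s


# Hypothetical 4-vertex hop transition matrix (rows sum to 1), illustration only.
P = [[0.0, 0.5, 0.3, 0.2],
     [0.4, 0.0, 0.4, 0.2],
     [0.3, 0.3, 0.0, 0.4],
     [0.2, 0.2, 0.6, 0.0]]
d01, s01 = mean_and_fluctuation(P, 0, 1)
print(d01, s01)
```

Both quantities are symmetric in i and j even when P itself is not, and the sample standard deviation requires at least four vertices.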

Secondly, we explore the behavior of $P_{ij}$, because a low transition
probability of going from state i to j means that state j is less
accessible from state i. On the other hand, a high transition
probability among states defines a class of easily connected states.
This is better understood in terms of $1/P_{ij}$. Since the elements
$P_{ij}$ are not necessarily symmetric, we define how close i and j are
by taking as distance
$\min\{1/P_{ij}, 1/P_{ji}\} = 1/\max\{P_{ij}, P_{ji}\} = 1/P_{ij}^{\max}$.
In other words, the quantity $1/P_{ij}^{\max}$ sets different levels of
transient classes on G(V, E).

Thirdly, in order to have a well-defined class of states we should
expect a small transition probability for leaving it. Let us also
introduce the notation $P_{ij}^{\min} = \min\{P_{ij}, P_{ji}\}$. Thus, a
large value of $\Delta_{ij} = P_{ij}^{\max} - P_{ij}^{\min}$ is a
consequence of a small value for the leaving probability
$P_{ij}^{\min}$ and a large value for $P_{ij}^{\max}$.

Therefore, we extract the desired hierarchical analysis by heuristically
defining a similarity matrix (or "distance matrix") D that takes the
above remarks into account simultaneously:

\[
D_{ij} = s_{ij}\, \frac{\max\{\Delta_{ij},\, P_{ij}^{\min}\}}{P_{ij}^{\max}}.
\tag{9}
\]



Comparative values of $P_{ij}^{\min}$, for different (i, j) pairs, may
be translated as a penalty when they are rather large, which has an
intimate connection with $\Delta_{ij}$. Thus, the maximum between
$\Delta_{ij}$ and $P_{ij}^{\min}$ enters in the numerator of Eq. (9) as
an extra term that helps to set a similarity (or proximity) scale. As we
will show in the next sections, the symmetric matrix D is able to unveil
the entire transient classes of states.
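The three remarks combine into a single formula that is easy to sketch in code. The version below assumes the form $D_{ij} = s_{ij} \max\{\Delta_{ij}, P_{ij}^{\min}\} / P_{ij}^{\max}$ and takes the fluctuations $s_{ij}$ as an input; the function name and all numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def similarity_matrix(P, s):
    """D_ij = s_ij * max(Delta_ij, Pmin_ij) / Pmax_ij for every pair i != j."""
    n = len(P)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p_max = max(P[i][j], P[j][i])
            p_min = min(P[i][j], P[j][i])
            delta = p_max - p_min
            D[i][j] = D[j][i] = s[i][j] * max(delta, p_min) / p_max
    return D


# Hypothetical inputs: P is a hop transition matrix (rows sum to 1) and
# s is a symmetric matrix of fluctuations s_ij; the values are illustrative.
P = [[0.0, 0.5, 0.3, 0.2],
     [0.4, 0.0, 0.4, 0.2],
     [0.3, 0.3, 0.0, 0.4],
     [0.2, 0.2, 0.6, 0.0]]
s = [[0.00, 0.07, 0.10, 0.20],
     [0.07, 0.00, 0.15, 0.25],
     [0.10, 0.15, 0.00, 0.30],
     [0.20, 0.25, 0.30, 0.00]]
D = similarity_matrix(P, s)
```

Because only the maximum and minimum of the pair $(P_{ij}, P_{ji})$ enter, D is symmetric by construction even though P is not.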



4. EVALUATING COMMUNITY 
IDENTIFICATION 

To understand the meaning of those transient classes, we investigate in
some examples the structure of G(V, E) encoded by the similarity matrix.
Our analysis reveals well-defined classes of vertices. They occur at
different levels of the hierarchical tree, with the interesting
interpretation of communities, i.e., the structure of well-defined
subnetworks.
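The hierarchical trees in this section are built with the complete linkage method (see the figure captions). A minimal sketch of that agglomerative procedure on a dissimilarity matrix is given below; the function name, the example matrix and the rescaling of heights by the largest merge height are assumptions for illustration and are not taken from the original analysis, which used R.

```python
def complete_linkage(D):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    clusters = [(i,) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        # Complete linkage: cluster distance is the largest pairwise dissimilarity.
        height, a, b = min(
            (max(D[i][j] for i in ca for j in cb), ca, cb)
            for x, ca in enumerate(clusters) for cb in clusters[x + 1:]
        )
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        merges.append((a, b, height))
    return merges


# Hypothetical dissimilarities: vertices 0 and 1 are close, as are 2 and 3.
D = [[0.0, 0.1, 0.8, 0.9],
     [0.1, 0.0, 0.7, 0.8],
     [0.8, 0.7, 0.0, 0.2],
     [0.9, 0.8, 0.2, 0.0]]
merges = complete_linkage(D)
top = max(h for _, _, h in merges)
scaled = [(a, b, 100 * h / top) for a, b, h in merges]   # heights on a 0-100 scale
```

Reading the merges from first to last reproduces the dendrogram from its leaves to its root; cutting at a given height yields the communities at that hierarchical level.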



A. Performance on artificial community graphs 



Before discussing a particular issue on how to implement the algorithm,
we report its performance on graphs with a well-known fixed community
structure [14]. Our method was tested on a large number of graphs with
n = 128 vertices, designed to have four communities of 32 vertices each.
Each graph is randomly generated with probability $p_{in}$ of connecting
vertices in the same community and probability $p_{out}$ of connecting
vertices in different communities. Those probabilities are chosen so
that the average degree of each vertex equals 16. The test amounts to
evaluating the fraction of vertices correctly classified as a function
of $z_{out}$, the average number of edges a given vertex has to vertices
outside its own community. Our algorithm correctly classifies vertices
into the four communities for small values of $z_{out}$, with its
performance decreasing towards $z_{out} = 8$. We have, for example, the
fractions 0.99±0.01, 0.95±0.01, 0.81±0.02 and 0.57±0.03, respectively
for $z_{out}$ = 5, 6, 7 and 8. The error bars were evaluated over 100
randomly generated graphs. Those results are competitive with the
algorithms analyzed in Ref. [...]. Moreover, we stress that the proposed
method is fully parameter independent. Also, its computational cost is
limited by the methods used to compute the eigenvalues and eigenvectors
of symmetric matrices. In general this amounts to an initial $O(n^3)$
operations, with subsequent less expensive $O(n^2)$ iterations.
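A benchmark graph of this kind can be generated as in the sketch below. It assumes that the connection probabilities follow from the target averages as $p_{in} = z_{in}/31$ and $p_{out} = z_{out}/96$ with $z_{in} + z_{out} = 16$; the function name, its parameters and the fixed seed are hypothetical.

```python
import random


def planted_partition(z_out, seed=0, groups=4, size=32, mean_degree=16):
    """Random graph with `groups` communities; returns (edges, community labels)."""
    rng = random.Random(seed)
    n = groups * size
    z_in = mean_degree - z_out
    p_in = z_in / (size - 1)        # expected z_in neighbours among 31 peers
    p_out = z_out / (n - size)      # expected z_out neighbours among 96 others
    label = [v // size for v in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < (p_in if label[i] == label[j] else p_out)]
    return edges, label


edges, label = planted_partition(z_out=6)
mean_degree = 2 * len(edges) / len(label)
inter = sum(label[i] != label[j] for i, j in edges)
print(round(mean_degree, 2), round(2 * inter / len(label), 2))
```

The two printed numbers are the realized average degree and the realized $z_{out}$, which fluctuate around 16 and 6 from graph to graph.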



B. A graph with leaves 

The method is quite simple, and much of the computer time is spent in
calculating the eigenvalues and eigenvectors of L. All that remains to
calculate are the effective resistances in Eq. (4) and, with the
elements $P_{ij}$, the final similarity matrix D in Eq. (9). However,
some care is needed when the graph presents what we call leaves. This is
explained as follows.

We present in Fig. 1 a small graph to display the information contained
in the matrix D and how to perform the hierarchical analysis. This
example shows a graph containing a subgraph with tree-like topology. A
tree is a connected acyclic graph. In this example, the tree is the
subgraph with vertex numbers 5, 6 and 7, which we call leaves. Their
effective resistances are $r_{56} = r_{57} = 1$ and therefore we have
$r_{67} = 2$. For tree-like subgraphs the effective resistances
correspond to the number of edges $\ell_{ij}$ connecting vertices i and
j. Therefore, $r_{ij} = \ell_{ij}$ for acyclic branches. Also,
$r_{48} = 1$ because there is only one way of reaching vertex 8 from
vertex 4. On the other hand, whenever we have different paths joining
adjacent vertices (i, j), we obtain $r_{ij} < 1$ as a consequence of
calculating the effective resistance of resistors connected in parallel
and in series. For example, $r_{89} = r_{8(10)} = r_{9(10)} = 0.6667$.
To unveil the hierarchical structure of graphs with leaves, we need to
proceed as follows, because well-defined transient classes of states are
only identified for graphs with no local tree-like topology. Suppose we
start with a graph with m vertices (here m = 10). If the graph has
leaves, we collect leaf after leaf to remove acyclic branches, and we
end up with a reduced number of vertices n (n < m). After collecting all
leaves, we work with the Laplacian matrix of order n obtained from the
reduced adjacency matrix. During this process we keep track of the
original labels. The hierarchical structure of this example is shown in
Fig. 2 as a dendrogram, where we have joined the previously removed
vertices (6, 7 and 5) to vertex 3 because they naturally belong to the
same community as vertex 3. All dendrograms presented here have their
similarity D (y-axis) scaled to the range (0, 100). This allows a
comparative display of their branches.

FIG. 1: A simple graph with a tree-like subgraph: vertices 5, 6 and 7.
Our graph figures are drawn using VISONE (www.visone.de).
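The leaf-collecting step can be sketched as an iterative pruning of degree-one vertices that remembers where each removed vertex should be re-attached. The function name and the example graph are hypothetical; the graph only loosely mirrors the description above (a cycle with the branch 3-5, 5-6, 5-7) and is not claimed to be the graph of Fig. 1.

```python
def prune_leaves(adjacency):
    """Repeatedly remove degree-1 vertices; map each removed vertex to its core anchor."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    parent = {}
    leaves = [v for v in adj if len(adj[v]) == 1]
    while leaves:
        v = leaves.pop()
        if v not in adj or len(adj[v]) != 1:
            continue
        (u,) = adj.pop(v)             # the only neighbour of the leaf
        adj[u].discard(v)
        parent[v] = u
        if len(adj[u]) == 1:          # u has become a leaf itself
            leaves.append(u)

    def anchor(v):                    # follow removed vertices up to the core
        while v in parent:
            v = parent[v]
        return v

    return adj, {v: anchor(v) for v in parent}


# Hypothetical graph: cycle 1-2-3-4 with the acyclic branch 3-5, 5-6, 5-7.
graph = {1: [2, 4], 2: [1, 3], 3: [2, 4, 5], 4: [1, 3],
         5: [3, 6, 7], 6: [5], 7: [5]}
core, attached = prune_leaves(graph)
print(sorted(core), attached)
```

The reduced graph `core` is what enters the Laplacian of order n, and `attached` records that vertices 5, 6 and 7 rejoin the community of vertex 3 when the dendrogram is drawn.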



FIG. 2: The community structure of the graph in Fig. 1 is depicted as a
hierarchical tree or dendrogram with the complete linkage method for the
similarity matrix D. Our dendrogram figures are drawn with the data
plotting package and programming language R (http://www.R-project.org).
C. Zachary karate club network

To illustrate further the meaning of transient classes on G(V, E) from
the global information carried by D, we analyze two well-known networks
from the literature.

The first example (Fig. 3) corresponds to the network of members of the
karate club studied by Zachary [19]. This graph contains a single leaf:
member 12. Our analysis leads to the hierarchical structure shown in
Fig. 4 by means of a hierarchical clustering tree, which defines
communities at different levels. The two main communities reproduce
exactly the observed splitting of the Zachary club, which has been
studied with different community-finding techniques [...].
Interestingly, a smaller community presented by the hierarchical tree
can be clearly identified in Fig. 3. It consists of the members
displayed with shaded circles. This small group is only influenced by
its members and has a direct interaction with the instructor.

FIG. 3: The karate club network studied by Zachary. Individual numbers
represent the members of the club and edges their relationships as
observed outside the normal activities of the club. Squares and circles
indicate the observed final splitting of the karate club into two
communities led by the administrator (34) and the instructor (1). A
clear further splitting is identified with shaded circles.

FIG. 4: The hierarchical structure of the network in Fig. 3 is shown as
a dendrogram with the complete linkage method. It correctly identifies
the two main communities of the karate club.
FIG. 5: The deterministic hierarchical scale-free model with n = 5
vertices proposed by Ravasz et al. [20]. It is built by generating
replicas of the small 5-vertex module (a) shown at the left side.



D. Ravasz and Barabasi square hierarchical 
network 



The second example is shown in Fig. 5. It was designed by Ravasz and
Barabasi [20] as a prototype of the hierarchical organization we may
encounter in real networks with scale-free topology and high modularity.
The main figure is built with the module in (a). A similar figure, but
with more connections between vertices, can be built with the module in
(b). The study of $D_{ij}$ reveals community structures at different
hierarchical levels in Fig. 6, respectively for the graphs generated
with the modules (a) and (b).

The hierarchical trees present similar structures, but the hierarchical
levels in both figures clearly display different network formation
patterns. Moreover, the hierarchical formation pattern of G(V, E), with
branches at different heights, may be seen as a measure of how cohesive
those subgroups are. The normalized scale for $D_{ij}$ can then be used
to also set degrees of cohesiveness related to the community formation.



5. WEIGHTS ON THE EDGES 

Our method also applies to graphs in which each edge carries a positive
real number, the weight of the edge. The structure of the graph is now
represented by the corresponding weighted adjacency matrix W. It assigns
weight $w_{ij} > 0$ if and only if i and j are connected vertices, and 0
otherwise. The concept of the Laplacian matrix extends directly to
weighted edges, L(G) = E(G) - W(G), where
$E_{ii} = \sum_{j=1}^{n} w_{ij}$ defines the diagonal weight matrix
whose entries are the total weight of the edges adjacent to vertex i.
Again, L(G) is a real symmetric matrix whose row sums and column sums
are all zero. Thus, we have the same spectral properties as recalled for
the particular case $w_{ij} = 1$ for all adjacent vertices i and j.
Therefore, the method presented for unweighted graphs extends naturally
to weighted ones with no change in the algorithm.

FIG. 6: Hierarchical structure for the formation pattern of the network
in Fig. 5. Dendrogram (a) refers to the network built with module (a) in
Fig. 5, whereas dendrogram (b) refers to the graph built with module (b)
in Fig. 5.
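The weighted Laplacian L = E - W can be built exactly like its unweighted counterpart, as the following sketch shows. The function name and the example weights are hypothetical.

```python
def weighted_laplacian(n, weighted_edges):
    """Return L = E - W, where E_ii is the total weight of edges at vertex i."""
    W = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        if w <= 0:
            raise ValueError("edge weights must be positive")
        W[i][j] = W[j][i] = float(w)
    strength = [sum(row) for row in W]          # E_ii = sum_j w_ij
    return [[strength[i] if i == j else -W[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical example: a triangle whose edge (0, 1) is twice as heavy.
L = weighted_laplacian(3, [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 1.0)])
for row in L:
    print(row)
```

Setting every weight to 1 recovers the unweighted Laplacian, so the remaining steps of the method can consume this matrix unchanged.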



A. Performance on artificial community weighted graphs



We have also verified the performance of this method on weighted graphs
with fixed community structure [31]. Our test is performed on the same
randomly generated artificial graphs described in Section 4.A. The
computer-generated graphs have 128 vertices and are divided into four
groups of 32 vertices. Here, edges among vertices are randomly chosen
such that the average degree is fixed at 16. The test is performed for
the most difficult situation, where $z_{out} = z_{in} = 8$. That is,
each vertex has as many connections inside its community as outside it.
For each graph, we attach a weight w > 1 to the edges inside each
community and keep the fixed weight 1 for the edges that lie between
communities. We evaluate again the fraction of vertices classified
correctly as a function of w. As w increases from the starting value 1,
the weights enhance the community structure. This is clearly highlighted
by our method. Our performance amounts to the following fractions of
correctly classified vertices: 0.89, 0.94, 0.97 and 0.98, respectively
for w = 1.4, 1.6, 1.8 and 2. The averages were calculated over 100
randomly generated graphs, with error bars smaller than 0.01.



B. Identifying cohesive subgroups

As an example, we apply our method to the problem of analyzing weighted
interactions that describe how pairs of teachers engage in professional
discussions [32]. This is a social network with n = 24 members. Its
edges are characterized by the professional discussions in a high
school, called "Our Hamilton High", during the 1992-1993 school year.
Teachers were asked to list at most five teachers and to weight the
frequency of their discussions with them in that school. This way of
attributing weights leads to a directed network. The weights follow a
scale running from 1, for discussions occurring less than once a month,
to the largest weight value 4, for almost daily discussions [32]. Every
vertex number carries characteristics of the teachers such as gender,
race, subject field and room assignment, among others. To perform our
analysis we have defined the weight of each edge as the average of the
values placed on the edges in the original directed network. Thus, this
new weighted network is characterized by edges with real values in the
range (0.5, 4) representing the interactions among the members of that
school. The community structure revealed by our analysis is represented
by the dendrogram in Fig. 7. Its structure exhibits the formation of
various communities. For comparison with the results in [32], we also
pick out the four main groups. The study of their members reveals an
association mainly according to race and gender, as also found in Ref.
[32]. However, there are some differences in the identification of
members in each group. This may be due to the fact that we are not
analyzing exactly the same weighted network: our network is made
undirected through an averaging process, while the original one was
handled in its original directed form.

FIG. 7: Network community of professional discussions among teachers at
"Our Hamilton High".
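The averaging that turns the directed survey weights into an undirected network can be sketched as follows. The sketch assumes that a missing nomination counts as weight 0, which is consistent with the smallest averaged value being 0.5; the function name and the sample nominations are hypothetical and are not the survey data.

```python
def average_directed_weights(n, directed):
    """W_ij = W_ji = (w_ij + w_ji) / 2, with a missing nomination counted as 0."""
    W = [[0.0] * n for _ in range(n)]
    for (i, j), w in directed.items():
        W[i][j] += w / 2
        W[j][i] += w / 2
    return W


# Hypothetical nominations: teacher 0 rates 1 with 4, teacher 1 rates 0 with 2,
# and teacher 2 rates 0 with 1 without being nominated in return.
directed = {(0, 1): 4, (1, 0): 2, (2, 0): 1}
W = average_directed_weights(3, directed)
for row in W:
    print(row)
```

The resulting matrix is symmetric and can be passed directly to a weighted Laplacian construction.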



6. CONCLUSIONS 

In conclusion, random walks on graphs in connection with electrical
networks highlight a topological property of G(V, E): transient classes
of vertices, which we interpret as communities in the original graph.
Here we emphasize that those special classes of vertices are a direct
consequence of the effective transition probabilities, which provide a
global perspective on the map of interactions that characterizes the
graph. We demonstrate its high performance in identifying community
structures in some examples that have become benchmarks for initial
algorithm validation. Moreover, it is independent of parameter tuning.
Our criterion to define communities depends only on G(V, E) and not on
any explicit definition of what a community structure must be.

It is likely that our proposed algorithm may produce new insights for
large graphs. Application examples may include protein-protein
interactions and compartment identification in food-web structures. The
visual information about how members form communities along the
hierarchical tree may help to understand and characterize cohesive
communities.